Sentinel LoRA training + RAG budget fix + fallback elimination #270
Conversation
- genome/dataset-prepare: extract chat history into JSONL training data
- genome/train: PEFT LoRA training with Python subprocess, returns adapter
- genome/training-pipeline: Sentinel pipeline orchestrating prepare→train→register→activate
- AdapterPackage: manifest.json packaging with SHA-256 content hashing, size calculation
- GenomeLayerEntity persistence: genome/train creates database record after training
- genome/paging-adapter-register: accepts layerId to hydrate from persisted entity
- LoRATrainingPipeline wires layerId via {{steps.1.0.data.layerId}} interpolation (sketched after this list)
- 74 unit tests (11 new for AdapterPackage + pipeline layerId wiring)
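A minimal sketch of the layerId wiring described above. The step shape here is illustrative only — the real Pipeline/PipelineStep types come from workers/continuum-core/bindings/modules/sentinel and may differ.

```ts
// Illustrative shape only — the real PipelineStep type comes from
// workers/continuum-core/bindings/modules/sentinel and may differ.
interface CommandStep {
  type: 'command';
  command: string;
  params: Record<string, unknown>;
}

// The register step consumes the layerId persisted by sub-step 0 of step 1
// (genome/train); "{{steps.1.0.data.layerId}}" is resolved at runtime by the
// interpolation engine before the command executes.
const registerStep: CommandStep = {
  type: 'command',
  command: 'genome/paging-adapter-register',
  params: { layerId: '{{steps.1.0.data.layerId}}' },
};
```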
- Entities: AcademySession, AcademyCurriculum, AcademyExamination
- Commands: genome/dataset-synthesize (LLM data generation), genome/academy-session (orchestration)
- Pipelines: TeacherPipeline (curriculum→synthesize→exam→grade), StudentPipeline (train→answer→report)
- Extended PipelineStep bindings with emit/watch/parallel/sentinel step types (sketched below)
- 101 unit tests passing, integration tests for both new commands
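A hedged sketch of what the extended step bindings might look like. The commit messages name eight of the nine step types; the field names below are assumptions, not the generated bindings.

```ts
// Illustrative union — eight of the nine step types are named across the
// commit messages; field names here are assumptions.
type StepType =
  | 'llm' | 'command' | 'emit' | 'watch'
  | 'parallel' | 'sentinel' | 'loop' | 'condition';

interface PipelineStep {
  type: StepType;
  event?: string;          // emit publishes this event; watch blocks until it arrives
  payload?: unknown;       // emit payload
  steps?: PipelineStep[];  // sub-steps for parallel/loop/sentinel containers
}
```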
…op-relative refs
Three enhancements to the Rust interpolation engine that enable the Academy
dual-sentinel pipeline to execute end-to-end:
- Multi-pass nested interpolation: {{steps.0.output.topics.{{input.iteration}}.name}}
resolves inner patterns first via regex matching innermost {{}} only
- traverse_json_path(): array indexing (topics.0.name) and JSON string auto-parsing
for structured LLM output traversal
- {{loop.N.field}} syntax: stable intra-loop step referencing via _loop_base offset,
so loop.0.data always means "first sub-step of current iteration"
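The engine itself is Rust; the TypeScript sketch below only mirrors the multi-pass idea — match the innermost {{…}} (no nested braces), substitute, and repeat until the string stops changing. The resolver callback stands in for traverse_json_path().

```ts
// Simplified stand-in for the Rust engine: resolve innermost {{...}} first,
// then re-scan, so nested patterns collapse from the inside out.
function interpolate(template: string, resolve: (path: string) => string): string {
  // Matches {{...}} containing no nested braces, i.e. the innermost patterns.
  const innermost = /\{\{([^{}]+)\}\}/g;
  let current = template;
  let previous: string;
  do {
    previous = current;
    current = current.replace(innermost, (_, path) => resolve(path));
  } while (current !== previous);
  return current;
}

// "{{steps.0.output.topics.{{input.iteration}}.name}}" first becomes
// "{{steps.0.output.topics.2.name}}", then resolves to that topic's name.
```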
Pipeline command routing fix: sentinel command steps now route through TypeScript
(execute_ts_json) instead of Rust module registry, avoiding the data/ prefix
collision where Rust DataModule intercepted commands meant for TypeScript context
injection (dbPath, sessionId, userId).
ORMRustClient.store() fix: returns Rust-generated entity ID instead of echoing
back original input data (which lacked the auto-generated UUID).
Pipeline template fixes: correct watch payload paths (data.payload.X), entity ID
paths (data.data.id), LLM output traversal (output.X not data.X for parsed JSON),
session-scoped adapter names, system default model for student exams.
106 Rust sentinel tests pass. Demonstrated 6 of 9 step types in live dual-sentinel
orchestration: LLM, Command, Emit, Watch, Loop, Condition.
… to inbox
- SentinelEntity class with field decorators, registered in EntityRegistry
- SentinelEscalationService: event-driven bridge routing sentinel lifecycle events (complete/error/cancelled) to owning persona's inbox
- Persona ownership: parentPersonaId on all sentinels, academy-session wires it
- Execution tracking: handle→entity mapping, persistExecutionResult()
- sentinel/save + sentinel/run extended with persona ownership params
- TaskEntity: new 'sentinel' domain + 4 sentinel task types
- Architecture docs updated with lessons learned + multi-modal roadmap
- 111 unit tests passing (11 new for SentinelEntity + escalation rules)
- MemoryType.SENTINEL: sentinel executions stored as durable persona memories
- SentinelTriggerService: auto-execute sentinels on event/cron/immediate triggers with debounce, concurrent execution guards, and dynamic registration (see the sketch after this list)
- PersonaTaskExecutor: sentinel task handlers + recallSentinelPatterns() for querying past sentinel executions when processing similar tasks
- InboxTask metadata: typed sentinel fields (sentinelName, entityId, handle, status)
- 125 unit tests passing (14 new: memory types, cron parsing, trigger validation)
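A hedged sketch of the debounce and concurrent-execution guard described above. Names (TriggerConfig, onTriggerEvent) are illustrative, not the SentinelTriggerService API.

```ts
// Illustrative names — not the real SentinelTriggerService API.
interface TriggerConfig {
  sentinelName: string;
  debounceMs: number;
}

const pending = new Map<string, ReturnType<typeof setTimeout>>();
const running = new Set<string>();

function onTriggerEvent(cfg: TriggerConfig, run: (name: string) => Promise<void>): void {
  // Debounce: restart the timer on every event; only the last one fires.
  const existing = pending.get(cfg.sentinelName);
  if (existing) clearTimeout(existing);
  pending.set(cfg.sentinelName, setTimeout(async () => {
    pending.delete(cfg.sentinelName);
    // Concurrency guard: skip if this sentinel is already executing.
    if (running.has(cfg.sentinelName)) return;
    running.add(cfg.sentinelName);
    try {
      await run(cfg.sentinelName);
    } finally {
      running.delete(cfg.sentinelName);
    }
  }, cfg.debounceMs));
}
```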
- genome/phenotype-validate command: LLM-as-judge scores pre/post training responses
- Student pipeline pre-test baseline (loop.1) before training establishes comparison point
- Quality gate condition (loop.10): only registers adapters with measurable improvement
- inference:demo and quality:gate:failed event payloads in AcademyTypes
- 138 tests passing (13 new covering phenotype scoring, quality gate, pipeline structure)
Phase C complete:
- genome/compose command: merges multiple LoRA layers into stacked genome
- Student pipeline: paging-activate after registration (LRU eviction)
- Student pipeline: post-loop genome/compose step merges all trained adapters
- Fix GenomeAssemblyTypes Timestamp import (pre-existing tech debt)
Phase D remediation:
- Teacher pipeline restructured with inner exam retry loop
- On failure: synthesizes targeted remedial data from weakAreas feedback
- Re-emits dataset:ready for student re-training, up to maxTopicAttempts
- TopicRemediatePayload and RemediationDatasetReadyPayload types
153 tests passing (15 new covering composition, paging, remediation)
- CompetitionTypes: CompetitorEntry, TopicGap, GapAnalysis, TournamentRound/Ranking, competition events
- CompetitionEntity: academy_competitions collection with 2+ competitor validation
- genome/academy-competition: spawns 1 shared teacher + N student sentinels per competitor
- genome/gap-analysis: per-topic field stats, weakness identification, remediation priorities
- 177 tests passing (24 new for competition types, entity, command types, gap analysis, events)
Candle is the ONLY local inference path. All 75+ files updated:
- Type system: ollamaModelName→trainedModelName, InferenceRuntime.OLLAMA→CANDLE, ModelTier 'ollama-capable'→'local-capable', embedding provider 'ollama'→'fastembed'
- Runtime: inference-worker routes 'candle'/'local' to CandleAdapter, VisionDescriptionService uses candle, PersonaModelConfigs deduplicated
- Comments: all Ollama references replaced with Candle/PEFT/local equivalents
- Tests: fixtures updated (219 affected unit tests pass, 0 regressions)
- Wire format: TS maps trainedModelName→ollama_model_name for Rust compat (Rust-side rename deferred to separate cargo test cycle)
Two critical bugs causing external WebSocket clients to hang forever:
1. JTAGRouter.handleIncomingRequest had no try/catch around routeToSubscriber — thrown errors propagated without sending a response, leaving clients waiting indefinitely (fix pattern sketched below).
2. CommandDaemon.processMessage threw on missing sessionId instead of returning an error response, triggering the above silent hang.
Also: ConnectionBroker ESM fix, vitest config with path aliases, raw WebSocket diagnostic script, and integration tests properly registering client via JTAGClient.registerClient().
All 10 sentinel-lora-training integration tests pass (50s).
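A minimal sketch of the fix pattern for bug 1 — always answer the client, even when routing throws. The routeToSubscriber and sendError signatures are assumptions; only the method names come from the description above.

```ts
// Sketch of fix 1: catch routing errors and send an error response instead
// of letting the thrown error propagate with no reply.
async function handleIncomingRequest(requestId: string, message: unknown): Promise<void> {
  try {
    await routeToSubscriber(message);
  } catch (err) {
    // Before the fix, a thrown error here left the client waiting forever.
    sendError(requestId, err instanceof Error ? err.message : String(err));
  }
}

// Assumed signatures, for illustration only.
declare function routeToSubscriber(message: unknown): Promise<void>;
declare function sendError(requestId: string, reason: string): void;
```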
- Remove unused TEXT_LENGTH imports from AcademySessionEntity and AcademyCurriculumEntity
- Use TEXT_LENGTH.UNLIMITED constant in AcademyExaminationEntity instead of raw maxLength: 0
- Replace all 9 `as any` casts in GenomeAcademySessionServerCommand with proper DataCreate/DataUpdate/PipelineSentinelParams types
- Replace all 15 `as any` casts in GenomeAcademyCompetitionServerCommand with same proper types
- Add missing browser command for academy-competition
- Add READMEs for genome/dataset-synthesize, genome/academy-session, and genome/academy-competition
177 unit tests pass, TypeScript compiles clean.
- genome/train: DataCreate.execute(), UUID layerId type
- genome/compose: DataRead.execute<GenomeLayerEntity>(), DataCreate, typed GenomePagingAdapterRegister and GenomeActivate params
- genome/gap-analysis: DataRead.execute<CompetitionEntity>(), DataList.execute<BaseEntity>() with readonly items
- genome/paging-adapter-register: DataRead.execute<GenomeLayerEntity>()
Only 1 `as any` remains across all genome server commands (enum check in job-create).
All 10 integration tests pass live.
Five gaps prevented trained LoRA adapters from affecting inference:
1. activeAdapters not on Rust wire type (TS-only)
2. AIProviderRustClient stripped activeAdapters from IPC payload
3. CandleAdapter.generate_text() never called load_lora/apply_lora
4. Candle registered as quantized() which rejects LoRA
5. Model mismatch: training on SmolLM2, inference on Llama-3.1-8B
Fixes:
- Add ActiveAdapterRequest to Rust wire type (ts-rs generated)
- Wire activeAdapters through AIProviderRustClient to Candle
- Add ensure_adapters() to CandleAdapter for LoRA loading + stacking
- Switch Candle to regular mode (BF16, LoRA-compatible)
- Unify all model references to LOCAL_MODELS.DEFAULT (Llama-3.2-3B)
- Eliminate duplicate model mapping table in PEFTLoRAAdapter
- Add AdapterStore: filesystem-based single source of truth for adapter discovery, replacing hardcoded paths in LimbicSystem (sketched below)
- Add path validation in PersonaGenome.getActiveAdaptersForRequest()
- Fix SystemPaths.genome.adapters to match actual directory
- Fix lora.rs directory path resolution for adapter loading
48 files changed across Rust, TypeScript, and Python. All native AIs (Helper, Teacher, CodeReview, Local Assistant) verified responding after deployment.
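A sketch of the AdapterStore idea: discover adapters by scanning the filesystem rather than trusting hardcoded paths. It assumes the manifest.json layout from the AdapterPackage commit; the manifest field names and directory layout are assumptions.

```ts
import { existsSync, readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Assumed manifest shape, per the AdapterPackage commit (SHA-256 hash + size).
interface AdapterManifest {
  name: string;
  sha256: string;
  sizeBytes: number;
}

// Scan the adapters root for one subdirectory per adapter, each holding a
// manifest.json — the filesystem is the single source of truth.
function discoverAdapters(adaptersRoot: string): AdapterManifest[] {
  if (!existsSync(adaptersRoot)) return [];
  return readdirSync(adaptersRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => join(adaptersRoot, entry.name, 'manifest.json'))
    .filter((manifestPath) => existsSync(manifestPath))
    .map((manifestPath) => JSON.parse(readFileSync(manifestPath, 'utf8')) as AdapterManifest);
}
```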
WP1: KnowledgeTypes.ts foundation — SourceKnowledge, ExtractedFact,
DataSourceConfig (5 source types), BenchmarkDefinition/Result
WP2: groundingContext on genome/dataset-synthesize — grounded synthesis
forces LLM to trace all answers to verified facts
WP3: KnowledgeExplorationPipeline — builds sentinel pipelines that
explore git repos, web pages, conversations, or documents then
extract structured facts via LLM
WP4: TeacherPipeline rewrite — dynamic step indexing, optional
knowledge exploration, backward compatible
WP5: BenchmarkPipeline — auto-generates persistent test suites from
extracted knowledge, plus runner pipeline for scoring
WP6: SearchRateLimiter — Brave API quota tracking, 24hr LRU cache,
in-flight request deduplication (sketched after this list)
WP7: Documentation updates — completion criteria table, Phase D.5,
PRACTICAL-ROADMAP LoRA status correction
4 E2E tests: knowledge-synthesis-repo, benchmark-generation,
web-research-synthesis, sentinel-multi-step-pipeline
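A hedged sketch of WP6's three concerns: quota tracking, a 24-hour cache (the LRU eviction policy is simplified to pure TTL here), and in-flight request deduplication. Class shape and method names are assumptions.

```ts
const DAY_MS = 24 * 60 * 60 * 1000;

// Illustrative shape — the real SearchRateLimiter API may differ, and its
// cache is LRU; this sketch only checks a 24hr TTL.
class SearchRateLimiter {
  private cache = new Map<string, { result: unknown; at: number }>();
  private inFlight = new Map<string, Promise<unknown>>();
  private used = 0;

  constructor(private quota: number, private search: (q: string) => Promise<unknown>) {}

  async query(q: string): Promise<unknown> {
    const hit = this.cache.get(q);
    if (hit && Date.now() - hit.at < DAY_MS) return hit.result; // 24hr cache
    const pending = this.inFlight.get(q);
    if (pending) return pending;                                // dedupe in-flight
    if (this.used >= this.quota) throw new Error('Brave API quota exhausted');
    this.used++;
    const p = this.search(q)
      .then((result) => {
        this.cache.set(q, { result, at: Date.now() });
        return result;
      })
      .finally(() => this.inFlight.delete(q));
    this.inFlight.set(q, p);
    return p;
  }
}
```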
Replace Node.js spawn() in BaseServerLoRATrainer with
RustCoreIPCClient.sentinelExecute() — Python training subprocess
now runs under Rust's SentinelModule which provides:
- kill_on_drop: automatic cleanup if handle is dropped
- Timeout enforcement at the Rust tokio level
- Log capture to .sentinel-workspaces/{handle}/logs/
- Handle-based tracking: cancellable, status-queryable
- Concurrent execution limits (max_concurrent in Rust)
Sentinel handle propagates through LoRATrainingResult →
GenomeTrainResult so callers can inspect logs/status.
Verified: lora-inference-improvement E2E test passes (0% → 100%)
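A sketch of the spawn() → sentinelExecute() swap. Only sentinelExecute() itself is named in the commit; the request/response shapes and the training script name are assumptions based on the features listed above.

```ts
// Assumed result shape: the handle lets callers read
// .sentinel-workspaces/{handle}/logs/ or cancel the run.
interface SentinelExecuteResult {
  handle: string;
  success: boolean;
}

// Assumed request shape — timeout enforced at the Rust tokio level,
// with kill_on_drop cleanup if the handle is dropped.
interface RustCoreIPC {
  sentinelExecute(req: {
    command: string;
    args: string[];
    timeoutMs: number;
  }): Promise<SentinelExecuteResult>;
}

async function runLoRATraining(ipc: RustCoreIPC, datasetPath: string): Promise<SentinelExecuteResult> {
  return ipc.sentinelExecute({
    command: 'python',
    args: ['train_lora.py', '--dataset', datasetPath], // illustrative script name
    timeoutMs: 30 * 60 * 1000,
  });
}
```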
…rain
SentinelEventBridge polls Rust sentinel handles and emits TypeScript Events, bridging the IPC boundary for widgets and services. genome/train now supports async mode (returns handle immediately) alongside sync mode (default, blocks). TrainingCompletionHandler processes async results on completion.
E2E verified: 0% → 80% Nexaflux improvement with full pipeline.
…se types
- sentinel/run sync mode (async=false): sentinelExecute polls until completion, returns output directly instead of unavailable stepResults
- CLI timeout: sentinel commands added to 300s category (LLM pipeline steps need minutes, not the 10s default)
- sentinelExecute crash fix: pipeline-type sentinels don't produce log streams, added try/catch fallback to status.handle.error
- BenchmarkPipeline runner: data/list+filter instead of data/read (academy_benchmarks is a dynamic collection without registered entity)
- BenchmarkPipeline: removed apostrophe from grading prompt that broke shell single-quote wrapping in CLI test harness
- recipe-load test: fixed response structure (client proxy returns flat payload, not wrapped in commandResult), collection → collectionName
- genome-crud test: replaced undefined DATA_COMMANDS with literals, reduced embedding dims from 768→16, fixed nested result.data.id path
- genome-fine-tuning-e2e: generates inline dataset if fixture missing
- All 6 test suites: replaced references to unavailable stepResults/stepsCompleted with success+output fields from sync pipeline mode
- CRUDTestUtils: added DATA_COMMANDS constant for shared test use
Validated: sentinel-pipeline 4/4, genome-crud 4/4, recipe-load 4/4, benchmark-generation 4/4, lora-inference-improvement 0%→100%
Monitors active personas, triggers training for those with enough accumulated data. Throttles to max 1 concurrent GPU training job. Integrates with PersonaUser serviceInbox loop.
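A sketch of the throttle just described: skip training unless the GPU is free and the persona has accumulated enough data. Function and threshold names are illustrative.

```ts
// At most one GPU training job at a time; names are illustrative.
let gpuBusy = false;

async function maybeTrain(
  personaId: string,
  samples: number,
  minSamples: number,
  train: (id: string) => Promise<void>,
): Promise<boolean> {
  if (gpuBusy || samples < minSamples) return false; // throttled, or not enough data
  gpuBusy = true;
  try {
    await train(personaId);
    return true;
  } finally {
    gpuBusy = false; // release the slot even if training throws
  }
}
```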
- BenchmarkEntity and BenchmarkResultEntity: proper entities for academy benchmarks, replacing hardcoded collection strings with registered types
- BenchmarkPipeline: uses entity .collection constants instead of raw strings
- CodingChallengePipeline: deterministic coding challenge evaluation via sentinel — reads buggy source, runs tests, LLM fixes, re-runs tests, scores pass/fail with no LLM grading bias
- sentinelExecute: fix empty output for pipeline-type sentinels by falling back to last step output from steps log when combined log is empty
- Integration tests: coding-challenge-benchmark (100% score on task-manager 3-bug challenge), benchmark-generation regression test updated
RAG budget was using chars/4 estimation (250 tokens/msg) but Llama tokenizer averages chars/3 — causing 35% underestimate. Combined with hardcoded totalBudget=8000, minMessages=5 floor, Math.max(50,...) output floor, and isSmallContext threshold too low at 1500, candle personas (2048 context) had prompts exceeding context window and silently failing.
Fixes:
- totalBudget derived from contextWindow * 0.75, not hardcoded 8000 (see sketch below)
- avgTokensPerMessage: 250 → 350 (chars/3 estimation)
- Removed minMessages floor that forced 5 messages when budget allowed 4
- Removed Math.max(50,...) output token floor (0 = budget blown, not 50)
- isSmallContext threshold: 1500 → 3000 (skips injections for small models)
- calculateAdjustedMaxTokens uses actual content chars/3 not flat 250/msg
Type strictness (compiler-enforced, no runtime fallbacks):
- modelId and provider REQUIRED on RAGBuildOptions, AIGenerateParams, ThoughtStreamParams, RAGInspectParams
- model and provider REQUIRED on ModelConfig (UserEntity)
- getModelConfigForProvider() throws on unknown provider (no candle fallback)
- PersonaUser validates merged modelConfig (entity + provider defaults)
- Eliminated all || 'candle', ?? 'candle', || LOCAL_MODELS.DEFAULT fallbacks
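Working the budget math above through for a candle persona with a 2048-token context window. A minimal sketch — the helper names are assumptions, only the numbers come from the commit.

```ts
// Budget derivation from the fix description: 2048-token context window.
const contextWindow = 2048;
const totalBudget = Math.floor(contextWindow * 0.75); // 1536, was hardcoded 8000

// Llama tokenizers average ~3 chars/token, so a ~1050-char message is ~350
// tokens; the old flat estimate called every message 250 tokens.
function estimateTokens(content: string): number {
  return Math.ceil(content.length / 3);
}

// No minMessages floor: if the budget only covers 4 messages, send 4.
function fitMessages(messages: string[], budget: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const m of messages) {
    const t = estimateTokens(m);
    if (used + t > budget) break;
    kept.push(m);
    used += t;
  }
  return kept;
}
```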
Rust:
- ModelBackend trait unifying safetensors and GGUF backends
- backends/llama_safetensors.rs + llama_gguf.rs with BF16_PRACTICAL_CONTEXT
- Vendored quantized_llama.rs for future GGUF context window fix
- DomainClassifier for persona task routing
- Self-task generator, genome paging, cognition module updates
- Channel module and unified persona updates
TypeScript:
- Academy session command + types for coding challenges
- CodingStudent/CodingTeacher/ProjectStudent/ProjectTeacher pipelines
- CandleGrpcAdapter with correct model ID and IPC query
- ModelContextWindows: Llama-3.2-3B at 2048 (BF16 practical limit)
- ModelRegistry console.log cleanup
- PersonaGenome, PersonaAutonomousLoop, RustCognitionBridge updates
- TrainingDataAccumulator, MotorCortex, PersonaMemory fixes
- QueueItemTypes, PersonaTaskExecutor updates
- Project scaffolds (ecommerce-api, url-shortener)
- Integration + unit tests
All compiles clean (TypeScript + Rust).
Pull request overview
This PR implements a comprehensive Sentinel pipeline engine with LoRA training capabilities, fixes RAG token budget calculations, and removes all Ollama references in favor of Candle-based local inference.
Changes:
- Introduced Sentinel pipeline system with 9 step types and 103 Rust tests for orchestrating complex multi-step workflows
- Added end-to-end LoRA training pipeline: dataset preparation → PEFT training → adapter registration → activation → inference
- Implemented Academy Dojo architecture with dual-sentinel teacher/student system for autonomous skill training
- Fixed RAG token budget to use model's actual context window instead of hardcoded values
- Made modelId/provider required across pipeline to eliminate fallback chains and ensure correct token calculations
- Replaced all Ollama references with Candle for local inference
Reviewed changes
Copilot reviewed 191 out of 288 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/debug/jtag/system/sentinel/pipelines/LoRATrainingPipeline.ts | New pipeline template for orchestrating LoRA training workflow |
| src/debug/jtag/system/sentinel/index.ts | Added exports for SentinelEntity class and escalation/trigger services |
| src/debug/jtag/system/sentinel/entities/SentinelEntity.ts | Database entity for persisting sentinel definitions with execution history |
| src/debug/jtag/system/rag/shared/RAGTypes.ts | Changed modelId/provider from optional to required for correct budget calculation |
| src/debug/jtag/system/genome/fine-tuning/shared/FineTuningTypes.ts | Added QLoRA quantization options and removed Ollama references |
| src/debug/jtag/commands/genome/train/shared/GenomeTrainTypes.ts | New command types for executing LoRA training via PEFT |
| src/debug/jtag/commands/genome/dataset-prepare/shared/GenomeDatasetPrepareTypes.ts | New command for collecting training data from chat history |
| src/debug/jtag/commands/genome/academy-session/shared/GenomeAcademySessionTypes.ts | Entry point for Academy Dojo dual-sentinel training system |
| src/debug/jtag/commands/sentinel/run/server/SentinelRunServerCommand.ts | Added async/sync modes and sentinel handle registration for escalation |
| src/debug/jtag/commands/ai/generate/shared/AIGenerateTypes.ts | Made model/provider required instead of optional |
| src/debug/jtag/cli.ts | Added sentinel commands to long-timeout category and CLI timeout override support |
Files not reviewed (1)
- src/debug/jtag/package-lock.json: Language not supported
```ts
import type { Pipeline, PipelineStep } from '../../../workers/continuum-core/bindings/modules/sentinel';
import type { UUID } from '../../core/types/CrossPlatformUUID';
import { LOCAL_MODELS } from '@system/shared/Constants';
```
The import of LOCAL_MODELS is used for the default value in line 51, but this creates a dependency on a constant that may not be relevant to all pipeline contexts. Consider making baseModel a required parameter or documenting why LOCAL_MODELS.DEFAULT is the appropriate fallback for LoRA training pipelines.
```ts
modelId: string; // Target model ID — drives context window, token budget, everything
provider: string; // AI provider (e.g. 'anthropic', 'candle', 'deepseek') — scopes model lookup
```
Making modelId and provider required is a breaking change that affects all callers of RAGBuildOptions. Consider adding migration documentation or a deprecation period with warnings for callers that don't provide these fields.
Suggested change:
```diff
-modelId: string; // Target model ID — drives context window, token budget, everything
-provider: string; // AI provider (e.g. 'anthropic', 'candle', 'deepseek') — scopes model lookup
+// NOTE: These fields remain optional for backward compatibility. Callers SHOULD provide them;
+// omission is deprecated and may become a hard requirement in a future major version.
+modelId?: string; // Target model ID — drives context window, token budget, everything
+provider?: string; // AI provider (e.g. 'anthropic', 'candle', 'deepseek') — scopes model lookup
```
```ts
if (params.layerId) {
  const readResult = await DataRead.execute<GenomeLayerEntity>({
    collection: GenomeLayerEntity.collection,
    id: params.layerId,
  });

  if (!readResult.success || !readResult.data) {
    return createGenomePagingAdapterRegisterResultFromParams(params, {
      success: false,
      registered: false,
      error: `GenomeLayerEntity not found for layerId: ${params.layerId}`,
    });
  }
```
The layerId hydration logic creates two distinct code paths (layerId vs raw params). Consider extracting this into a helper function like resolveAdapterParams(params) to improve readability and testability.
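A hypothetical helper along the lines of this suggestion: collapse the two code paths into one resolution step. The types are those visible in the snippet above; the spread-based hydration and the params type name are assumptions.

```ts
// Hypothetical resolveAdapterParams — not code from the PR. Hydrates params
// from the persisted GenomeLayerEntity when a layerId is given, otherwise
// passes the raw params through unchanged.
async function resolveAdapterParams(
  params: GenomePagingAdapterRegisterParams,
): Promise<GenomePagingAdapterRegisterParams> {
  if (!params.layerId) return params; // raw-params path: nothing to hydrate
  const readResult = await DataRead.execute<GenomeLayerEntity>({
    collection: GenomeLayerEntity.collection,
    id: params.layerId,
  });
  if (!readResult.success || !readResult.data) {
    throw new Error(`GenomeLayerEntity not found for layerId: ${params.layerId}`);
  }
  return { ...params, ...readResult.data }; // field-level merge is assumed
}
```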
```ts
model: string;
provider: 'openai' | 'anthropic' | 'local' | 'candle' | 'groq' | 'deepseek';
```
Making model and provider required is a breaking API change. All existing callers that relied on defaults will break. Consider providing migration guidance or a compatibility layer that infers these from context when missing.
Suggested change:
```diff
-model: string;
-provider: 'openai' | 'anthropic' | 'local' | 'candle' | 'groq' | 'deepseek';
+// May be omitted by callers that rely on defaults inferred from context.
+model?: string;
+provider?: 'openai' | 'anthropic' | 'local' | 'candle' | 'groq' | 'deepseek';
```
```ts
try {
  const parsed = JSON.parse(result.output);
  if (Array.isArray(parsed)) {
    stepResults = parsed;
  } else if (parsed.stepResults) {
    stepResults = parsed.stepResults;
  }
} catch {
  // Output wasn't JSON — that's fine, raw text is also valid
}
```
The JSON parsing logic assumes two possible structures (array or object with stepResults property) without documenting when each format is expected. Add comments explaining which sentinel types produce which output format.
```ts
// Merge Rust-generated fields (id, metadata) into the returned entity
// Rust auto-generates the UUID if not provided; the original `data` may lack it
const rustRecord = response.result?.data;
const mergedData = rustRecord
  ? { ...data, id: rustRecord.id ?? data.id } as T
  : data;
```
The merge logic prioritizes rustRecord.id over data.id, but doesn't document why this is necessary or what happens if both are present and differ. Add a comment explaining the precedence rules.
```ts
// Extract --timeout from params (CLI-level override, not a command parameter)
const userTimeoutMs = params.timeout ? Number(params.timeout) : undefined;
delete params.timeout;
```
Deleting params.timeout after extraction could cause issues if the command legitimately expects a timeout parameter. Consider using a different naming convention (e.g., --cli-timeout) to avoid conflicts.
Suggested change:
```diff
-// Extract --timeout from params (CLI-level override, not a command parameter)
-const userTimeoutMs = params.timeout ? Number(params.timeout) : undefined;
-delete params.timeout;
+// Extract --timeout from params (CLI-level override)
+const userTimeoutMs = params.timeout ? Number(params.timeout) : undefined;
```
Summary
Test plan
- TypeScript compiles clean (`npm run build:ts`)
- Rust compiles clean (`cargo check`)